AITopics | source population

source population

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Augmented transfer regression learning for completely missing covariates

Zhao, Huali, Wang, Tianying

arXiv.org Machine Learning, May-7-2026

Large-scale population-level datasets, such as the UK Biobank and the All of Us Research Program, often lack covariates needed for a specific analysis, such as genetic or lifestyle measures, while related studies measure them. This creates a cross-population missing data problem in which covariates are completely unobserved in the target population, rather than partially missing within one dataset. We propose an augmented transfer regression learning method for this setting. The key identifying condition is a sub-population shift assumption: the joint distribution of the outcome and observed covariates may differ across source and target populations, but the conditional distribution of the missing covariates given observed variables is invariant. We combine importance-weighted estimating equations with imputation terms for first- and second-order moments of the missing covariates. The resulting estimator is doubly robust, remaining consistent if either the density ratio model or both imputation models are correctly specified. It is $n^{1/2}$-consistent and asymptotically normal, and attains the semiparametric efficiency bound when both nuisance models are correctly specified.
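The weighting step in the abstract needs a density ratio between the target and source populations. One standard way to fit such a ratio is probabilistic classification: label target points 1 and source points 0, fit a logistic regression, and turn the fitted odds into weights. This is an assumption made for illustration, not the authors' stated specification. The sketch also collapses the outcome and observed covariates into a single hypothetical scalar `v`, and the function names are illustrative.

```python
import math
import random


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic_1d(xs, labels, iters=25):
    """Fit P(label = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # X^T W X (negative Hessian)
        for x, y in zip(xs, labels):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + (X^T W X)^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


def density_ratio_weights(source, target):
    """Hypothetical helper: estimate p_target(v) / p_source(v) at each source point.

    Assumption: a single scalar v stands in for (outcome, observed covariates),
    and the log density ratio is linear in v (a logistic model).
    """
    xs = list(source) + list(target)
    labels = [0] * len(source) + [1] * len(target)
    b0, b1 = fit_logistic_1d(xs, labels)
    # Classifier odds P(T=1|v) / P(T=0|v), rescaled by n_source / n_target.
    scale = len(source) / len(target)
    return [math.exp(b0 + b1 * v) * scale for v in source], (b0, b1)


# Toy usage: source ~ N(0, 1), target ~ N(1, 1).
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(2000)]
target = [random.gauss(1.0, 1.0) for _ in range(2000)]
weights, (b0, b1) = density_ratio_weights(source, target)
```

Both toy samples are unit-variance normals centred at 0 and 1, so the true log ratio is v - 1/2. The fitted slope and intercept should therefore land near 1 and -0.5.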

artificial intelligence, machine learning, target population, (18 more...)

arXiv.org Machine Learning

2605.04469

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.69)

Industry:

Health & Medicine > Consumer Health (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Debiased Machine Learning for Conformal Prediction of Counterfactual Outcomes Under Runtime Confounding

Barnatchez, Keith, Josey, Kevin P., Nethery, Rachel C., Parmigiani, Giovanni

arXiv.org Machine Learning, Apr-7-2026

Data-driven decision making frequently relies on predicting counterfactual outcomes. In practice, researchers commonly train counterfactual prediction models on a source dataset to inform decisions on a possibly separate target population. Conformal prediction has arisen as a popular method for producing assumption-lean prediction intervals for counterfactual outcomes that would arise under different treatment decisions in the target population of interest. However, existing methods require that every confounding factor of the treatment-outcome relationship used for training on the source data is additionally measured in the target population, risking miscoverage if important confounders are unmeasured in the target population. In this paper, we introduce a computationally efficient debiased machine learning framework that allows for valid prediction intervals when only a subset of confounders is measured in the target population, a common challenge referred to as runtime confounding. Grounded in semiparametric efficiency theory, we show the resulting prediction intervals achieve desired coverage rates with faster convergence compared to standard methods. Through numerous synthetic and semi-synthetic experiments, we demonstrate the utility of our proposed method.
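The abstract starts from conformal prediction. The sketch below shows the basic split-conformal construction that work of this kind builds on. It is not the debiased runtime-confounding method itself. Using absolute residuals as the nonconformity score is an assumption, and the function names are illustrative.

```python
import math


def split_conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of calibration nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf when
    that rank exceeds n (too few calibration points for the requested level).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(y_hat, calib_y, calib_pred, alpha=0.1):
    """Symmetric interval y_hat +/- q, using absolute residuals as scores."""
    scores = [abs(y - p) for y, p in zip(calib_y, calib_pred)]
    q = split_conformal_quantile(scores, alpha)
    return y_hat - q, y_hat + q


# Toy usage: nine calibration residuals 1..9 at alpha = 0.2 give q = 8.
interval = prediction_interval(5.0, list(range(1, 10)), [0] * 9, alpha=0.2)
```

The coverage guarantee relies on the calibration and test points being exchangeable. The abstract's setting, with a separate target population and confounders missing at runtime, is where that assumption needs extra machinery.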

artificial intelligence, machine learning, prediction interval, (17 more...)

arXiv.org Machine Learning

2604.03772

Genre: Research Report (0.82)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Covariate Shift Corrected Conditional Randomization Test

Neural Information Processing Systems, Nov-19-2025, 21:03:31 GMT

To address this issue, we propose a novel Covariate Shift Corrected Pearson Chi-squared Conditional Randomization (csPCR) test.

artificial intelligence, machine learning, target population, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Covariate Shift Corrected Conditional Randomization Test

Neural Information Processing Systems, Oct-10-2025, 09:20:32 GMT

To address this issue, we propose a novel Covariate Shift Corrected Pearson Chi-squared Conditional Randomization (csPCR) test.

covariate shift, density ratio, target population, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

LEARNER: A Transfer Learning Method for Low-Rank Matrix Estimation

McGrath, Sean, Zhu, Cenhao, Guo, Min, Duan, Rui

arXiv.org Machine Learning, Dec-29-2024

Low-rank matrix estimation is a fundamental problem in statistics and machine learning. In the context of heterogeneous data generated from diverse sources, a key challenge lies in leveraging data from a source population to enhance the estimation of a low-rank matrix in a target population of interest. One such example is estimating associations between genetic variants and diseases in non-European ancestry groups. We propose an approach, which we refer to as LatEnt spAce-based tRaNsfer lEaRning (LEARNER), that leverages similarity in the latent row and column spaces between the source and target populations to improve estimation in the target population. LEARNER performs a low-rank approximation of the target population data that penalizes differences between the latent row and column spaces of the source and target populations. We present a cross-validation approach that allows the method to adapt to the degree of heterogeneity across populations. We conducted extensive simulations, which found that LEARNER often outperforms the benchmark approach that only uses the target population data, especially as the signal-to-noise ratio in the source population increases. We also performed an illustrative application and empirical comparison of LEARNER and benchmark approaches in a re-analysis of a genome-wide association study in the BioBank Japan cohort. LEARNER is implemented in the R package learner.
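The central operation in the abstract is a low-rank approximation of a data matrix. The sketch below illustrates only that building block and does not implement LEARNER's penalty on differences between source and target latent spaces. The best rank-1 approximation in Frobenius norm is the top singular triple sigma * u * v^T. Power iteration on A^T A can find it without a linear algebra library. The function name is illustrative.

```python
import math


def rank1_approximation(a, iters=200):
    """Best rank-1 approximation sigma * u v^T of matrix `a` (a list of rows).

    Uses power iteration on a^T a. The all-ones starting vector must not be
    orthogonal to the top right singular vector, and `a` must be nonzero.
    """
    m, n = len(a), len(a[0])
    v = [1.0] * n
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        atav = [sum(a[i][j] * av[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]  # converges to the top right singular vector
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in av))  # top singular value
    u = [x / sigma for x in av]                # top left singular vector
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]


# Toy usage: an exactly rank-1 matrix is reproduced; diag(3, 1) keeps only the 3.
approx_exact = rank1_approximation([[3.0, 4.0], [6.0, 8.0]])
approx_diag = rank1_approximation([[3.0, 0.0], [0.0, 1.0]])
```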

artificial intelligence, machine learning, target population, (17 more...)

arXiv.org Machine Learning

2412.20605

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > Experimental Study (0.66)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.71)

A Transfer Learning Causal Approach to Evaluate Racial/Ethnic and Geographic Variation in Outcomes Following Congenital Heart Surgery

Han, Larry, Zhang, Yi, Nathan, Meena, Mayer, John E. Jr., Pasquali, Sara K., Zelevinsky, Katya, Duan, Rui, Normand, Sharon-Lise T.

arXiv.org Machine Learning, Mar-21-2024

Congenital heart defects (CHD) are the most prevalent birth defects in the United States and surgical outcomes vary considerably across the country. The outcomes of treatment for CHD differ for specific patient subgroups, with non-Hispanic Black and Hispanic populations experiencing higher rates of mortality and morbidity. A valid comparison of outcomes within racial/ethnic subgroups is difficult given large differences in case-mix and small subgroup sizes. We propose a causal inference framework for outcome assessment and leverage advances in transfer learning to incorporate data from both target and source populations to help estimate causal effects while accounting for different sources of risk factor and outcome differences across populations. Using the Society of Thoracic Surgeons' Congenital Heart Surgery Database (STS-CHSD), we focus on a national cohort of patients undergoing the Norwood operation from 2016 to 2022 to assess operative mortality and morbidity outcomes across U.S. geographic regions by race/ethnicity. We find racial and ethnic outcome differences after controlling for potential confounding factors. While geography does not have a causal effect on outcomes for non-Hispanic Caucasian patients, non-Hispanic Black patients experience wide variability in outcomes with estimated 30-day mortality ranging from 5.9% (standard error 2.2%) to 21.6% (4.4%) across U.S. regions.
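The abstract reports its mortality estimates with standard errors. If the sampling distributions are roughly normal, the usual 95% Wald interval is the estimate plus or minus 1.96 times the SE. Normality is an assumption here; the abstract does not say how the authors form intervals. The sketch applies that arithmetic to the two reported extremes.

```python
def wald_interval(estimate, se, z=1.96):
    """Normal-approximation (Wald) interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


# Reported 30-day mortality extremes for non-Hispanic Black patients (percent).
for est, se in [(5.9, 2.2), (21.6, 4.4)]:
    lo, hi = wald_interval(est, se)
    print(f"{est}% (SE {se}%): approx. 95% interval {lo:.1f}% to {hi:.1f}%")
# prints:
# 5.9% (SE 2.2%): approx. 95% interval 1.6% to 10.2%
# 21.6% (SE 4.4%): approx. 95% interval 13.0% to 30.2%
```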

non-hispanic black patient, norwood procedure, target population, (13 more...)

arXiv.org Machine Learning

2403.14573

Country:

North America > United States > Texas (0.14)
North America > United States > Georgia (0.14)
North America > United States > Michigan (0.04)
(47 more...)

Genre: Research Report > Experimental Study (0.90)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Surgery (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption

Zhang, Yuhang, Liu, Yue, Zhang, Zhihua

arXiv.org Machine Learning, Sep-28-2023

The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.
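The weight-fitting idea can be sketched in a heavily simplified form. The abstract minimizes a *conditional* maximum mean discrepancy. The sketch instead uses the plain (unconditional) squared MMD, with two source populations, scalar control outcomes, a Gaussian kernel of bandwidth 1, and a grid search over the mixture weight. All of these simplifications are assumptions, and every name is hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_mixture(w, s1, s2, target, bandwidth=1.0):
    """Squared (unconditional) MMD between w*P1 + (1 - w)*P2 and the target."""
    k11 = mean_kernel(s1, s1, bandwidth)
    k22 = mean_kernel(s2, s2, bandwidth)
    k12 = mean_kernel(s1, s2, bandwidth)
    k1t = mean_kernel(s1, target, bandwidth)
    k2t = mean_kernel(s2, target, bandwidth)
    ktt = mean_kernel(target, target, bandwidth)
    return (w * w * k11 + (1 - w) ** 2 * k22 + 2 * w * (1 - w) * k12
            - 2 * (w * k1t + (1 - w) * k2t) + ktt)


def fit_mixture_weight(s1, s2, target, grid_size=100):
    """Grid search on [0, 1] for the weight minimizing the squared MMD."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return min(grid, key=lambda w: mmd2_mixture(w, s1, s2, target))


def synthetic_treatment_mean(w, treated1, treated2):
    """Weighted mixture of the two source treatment-group means."""
    return w * sum(treated1) / len(treated1) + (1 - w) * sum(treated2) / len(treated2)


# Toy usage: the target controls are an even mix of the two source control groups.
source1_control = [0.0, 0.1, -0.1]
source2_control = [2.0, 2.1, 1.9]
target_control = source1_control + source2_control
w_hat = fit_mixture_weight(source1_control, source2_control, target_control)
```

The toy target sample is exactly an equal mix of the two source control groups, so the squared MMD reaches zero at w = 0.5. The same weight then mixes the source treatment groups.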

assumption, source population, target population, (13 more...)

arXiv.org Machine Learning

2309.16409

Country:

North America > Mexico (0.04)
Asia > India (0.04)
Asia > China > Beijing > Beijing (0.04)
(8 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Qiu, Hongxiang, Tchetgen, Eric Tchetgen, Dobriban, Edgar

arXiv.org Machine Learning, Jun-29-2023

Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such dataset shift conditions are known as domain adaptation or transfer learning. Despite extensive literature on dataset shift, few works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions (covariate, label and concept shift) as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.
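Covariate shift is one of the special cases the abstract names. Under it, the conditional distribution of the loss given covariates x is shared, while the distribution of x differs between populations. A standard augmented (doubly robust) estimator of target risk then combines two pieces: an outcome regression m(x) fit on source data, and the density ratio w(x) = p_target(x) / p_source(x). The sketch is that generic textbook construction, not the authors' efficient multiply robust estimator, and its names are hypothetical.

```python
def dr_target_risk(source_x, source_loss, target_x, weight_fn, outcome_fn):
    """Doubly robust target-risk estimate under covariate shift.

    weight_fn(x):  assumed density ratio p_target(x) / p_source(x)
    outcome_fn(x): assumed regression of the loss on x (fit on source data)
    The estimate is consistent if either function is correctly specified.
    """
    # Plug-in term: average predicted loss over target covariates.
    plug_in = sum(outcome_fn(x) for x in target_x) / len(target_x)
    # Correction term: density-ratio-weighted source residuals.
    correction = sum(
        weight_fn(x) * (loss - outcome_fn(x))
        for x, loss in zip(source_x, source_loss)
    ) / len(source_x)
    return plug_in + correction


# Toy data: source covers x = 0 and x = 1; every target point sits at x = 1.
source_x, source_loss, target_x = [0, 1], [1.0, 3.0], [1, 1]
true_ratio = {0: 0.0, 1: 2.0}.get
true_outcome = {0: 1.0, 1: 3.0}.get
wrong_ratio = lambda x: 1.0
wrong_outcome = lambda x: 0.0

good_w = dr_target_risk(source_x, source_loss, target_x, true_ratio, wrong_outcome)
good_m = dr_target_risk(source_x, source_loss, target_x, wrong_ratio, true_outcome)
both_bad = dr_target_risk(source_x, source_loss, target_x, wrong_ratio, wrong_outcome)
```

Every toy target point sits at x = 1, so the true target risk is 3. The estimate recovers it when only one nuisance function is wrong, but not when both are wrong (it gives 2).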

artificial intelligence, condition ds, machine learning, (16 more...)

arXiv.org Machine Learning

2306.16406

Country:

North America > United States > New York > New York County > New York City (0.14)
Africa > South Africa (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)

Prediction Sets Adaptive to Unknown Covariate Shift

Qiu, Hongxiang, Dobriban, Edgar, Tchetgen, Eric Tchetgen

arXiv.org Machine Learning, Jun-17-2023

Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift -- a prevalent issue in practice -- poses a serious unsolved challenge. In this paper, we show that prediction sets with a finite-sample coverage guarantee are uninformative and propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is asymptotically probably approximately correct, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.
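The abstract concerns *unknown* covariate shift. The *known*-shift case makes a useful contrast, since there the likelihood ratio w(x) between target and source covariate densities is available. A widely used construction for that case is weighted split conformal prediction. Each calibration score is reweighted by w(x_i), and the test point's own weight is placed at +infinity. The sketch assumes the weights are precomputed and is not PredSet-1Step; the function name is illustrative.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted split-conformal threshold under a known covariate shift.

    Calibration score i carries mass weights[i]; the test point's mass
    test_weight sits at +infinity. Returns the smallest score whose
    cumulative mass reaches (1 - alpha) of the total mass, else +inf.
    """
    total = sum(weights) + test_weight
    threshold = (1 - alpha) * total
    cumulative = 0.0
    for score, w in sorted(zip(scores, weights)):
        cumulative += w
        if cumulative >= threshold:
            return score
    return math.inf


# Toy usage: equal weights reduce to ordinary split conformal (threshold 8 of 1..9).
q_equal = weighted_conformal_quantile(list(range(1, 10)), [1] * 9, 1, 0.2)
```

The prediction set for a new point then contains every outcome whose score is at most the returned threshold. With all weights equal, the rule coincides with the unweighted split-conformal quantile.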

artificial intelligence, machine learning, prediction, (20 more...)

arXiv.org Machine Learning

2203.06126

Country:

Africa > South Africa (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
(3 more...)

Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics

Pavlović, Milena, Hajj, Ghadi S. Al, Kanduri, Chakravarthi, Pensar, Johan, Wood, Mollie, Sollid, Ludvig M., Greiff, Victor, Sandve, Geir Kjetil

arXiv.org Artificial Intelligence, Apr-3-2023

Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To keep the discussion concrete, we focus on a specific, recently established high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations.

artificial intelligence, batch effect, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2204.09291

Country:

Europe > Norway > Eastern Norway > Oslo (0.05)
Europe > Serbia (0.04)
North America > United States > North Carolina (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)